81 research outputs found

    Multiple Sequence Alignments Enhance Boundary Definition of RNA Structures

    Self-contained structured domains of RNA sequences often have distinct molecular functions. Determining the boundaries of structured domains of a non-coding RNA (ncRNA) is needed for many ncRNA gene finder programs that predict RNA secondary structures in aligned genomes, because these methods do not necessarily provide precise information about the boundaries or the location of the RNA structure inside the predicted ncRNA. Even without a structure prediction, it is of interest to search for structured domains, for example to find common RNA motifs in RNA-protein binding assays. Precise definition of the boundaries is essential for downstream analyses such as RNA structure modelling, e.g., through covariance models, and RNA structure clustering for the search of common motifs. Such efforts have so far focused on single sequences; here we present a comparison of boundary definition between single sequences and multiple sequence alignments. We also present a novel approach, named RNAbound, for finding boundaries based on the probabilities of evolutionarily conserved base pairings. We tested the performance of two different methods on a limited number of Rfam families using the annotated structured RNA regions in the human genome and their multiple sequence alignments created from 14 species. The results show that, independent of the chosen method, multiple sequence alignments improve boundary prediction for branched structures compared to single sequences. The actual performance of the two methods differs between single hairpin structures and branched structures. For the RNA families with branched structures, including transfer RNA (tRNA) and small nucleolar RNAs (snoRNAs), RNAbound improves the boundary predictions using multiple sequence alignments to median differences of −6 and −11.5 nucleotides (nts) for the left and right boundary, respectively (window size of 200 nts).
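
The "median differences" quoted for the left and right boundary can be illustrated with a minimal sketch. It assumes each region is a `(left, right)` coordinate pair and that a difference is computed as predicted minus annotated; the abstract states neither convention, and the function names below are hypothetical, not part of RNAbound.

```python
from statistics import median


def boundary_differences(predicted, annotated):
    """Signed per-region differences (predicted minus annotated).

    Both arguments are equal-length lists of (left, right) coordinate
    pairs. The sign convention is an assumption made for illustration.
    """
    if len(predicted) != len(annotated):
        raise ValueError("predicted and annotated must have the same length")
    left = [p[0] - a[0] for p, a in zip(predicted, annotated)]
    right = [p[1] - a[1] for p, a in zip(predicted, annotated)]
    return left, right


def median_boundary_differences(predicted, annotated):
    """Median signed difference for the left and the right boundary."""
    left, right = boundary_differences(predicted, annotated)
    return median(left), median(right)


# Toy coordinates, invented purely to exercise the functions.
predicted = [(10, 80), (5, 70), (20, 95)]
annotated = [(12, 85), (5, 82), (30, 100)]
left_med, right_med = median_boundary_differences(predicted, annotated)
```

Reporting the median rather than the mean keeps a few badly mispredicted regions from dominating the summary.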

    Research Topic

    An Excel file containing information on lncRNAs probed for FM bias. The first sheet contains a README file with details on the contents of the other two sheets. The second sheet contains the list of all lncRNAs probed, a summary of their biological function, and the original source from which we extracted them. The third sheet contains the FM bias p value computed for each significantly FM biased lncRNA in each cohort. (ODS 13 kb)

    Somatic and Germline Mutation Periodicity Follow the Orientation of the DNA Minor Groove around Nucleosomes

    Mutation rates along the genome are highly variable and influenced by several chromatin features. Here, we addressed how nucleosomes, the most pervasive chromatin structure in eukaryotes, affect the generation of mutations. We discovered that within nucleosomes, the somatic mutation rate across several tumor cohorts exhibits a strong 10 base pair (bp) periodicity. This periodic pattern tracks the alternation of the DNA minor groove facing toward and away from the histones. The strength and phase of the mutation rate periodicity are determined by the mutational processes active in tumors. We uncovered similar periodic patterns in the genetic variation among human and Arabidopsis populations, also detectable in their divergence from close species, indicating that the same principles underlie germline and somatic mutation rates. We propose that differential DNA damage and repair processes dependent on the minor groove orientation in nucleosome-bound DNA contribute to the 10-bp periodicity in AT/CG content in eukaryotic genomes
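
A periodicity of this kind can be looked for in a positional profile with a single-frequency Fourier projection. The sketch below is a generic illustration, not the analysis used in the study: the profile is synthetic, and the function names and the candidate period range are assumptions.

```python
import cmath
import math


def power_at_period(signal, period):
    """Spectral power of a mean-centred signal at one period (in positions)."""
    n = len(signal)
    mean = sum(signal) / n
    coeff = sum(
        (x - mean) * cmath.exp(-2j * math.pi * i / period)
        for i, x in enumerate(signal)
    )
    return abs(coeff) ** 2 / n


def dominant_period(signal, candidates):
    """Candidate period with the highest spectral power."""
    return max(candidates, key=lambda p: power_at_period(signal, p))


# Synthetic profile: a baseline rate with a 10-position oscillation
# over 140 positions (14 full cycles). Invented for illustration only.
profile = [1.0 + 0.5 * math.cos(2 * math.pi * i / 10) for i in range(140)]
best = dominant_period(profile, range(5, 21))
```

On real data the profile would be noisy, so the power at the candidate period would need to be compared against a suitable background, for example shuffled positions.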

    A splice-site variant in the lncRNA gene RP1-140A9.1 cosegregates in the large Volkmann cataract family

    Purpose: To identify the mutation for Volkmann cataract (CTRCT8) at 1p36.33. Methods: The genes in the candidate region 1p36.33 were Sanger and parallel deep sequenced, and informative single nucleotide polymorphisms (SNPs) were identified for linkage analysis. Expression analysis of the candidate gene with reverse transcription polymerase chain reaction (RT-PCR) was performed using RNA from different human tissues. Quantitative reverse transcription polymerase chain reaction (qRT-PCR) analysis of the GNB1 gene was performed in affected and healthy individuals. Bioinformatic analysis of the linkage regions including the candidate gene was performed. Results: Linkage analysis of the 1p36.33 CCV locus applying new marker systems obtained with Sanger and deep sequencing reduced the candidate locus from 2.1 Mb to 0.389 Mb, flanked by the markers STS-22AC and rs549772338, and resulted in a logarithm of the odds (LOD) score of Z = 21.67. The identified mutation, rs763295804, affects the donor splice site in the long non-coding RNA gene RP1-140A9.1 (ENSG00000231050). The gene including splice-site junctions is conserved in primates but not in other mammalian genomes, and two alternative transcripts were shown with RT-PCR. One of these transcripts represented a lens cell–specific transcript. Meta-analysis of the Cross-Linking-Immuno-Precipitation sequencing (CLIP-Seq) data suggested the RNA binding protein (RBP) eIF4AIII is an active counterpart for RP1-140A9.1, and several miRNA and transcription factor binding sites were predicted in the proximity of the mutation. ENCODE DNase I hypersensitivity and histone methylation and acetylation data suggest the genomic region may have regulatory functions. Conclusions: The mutation in RP1-140A9.1 suggests the long non-coding RNA as the candidate cataract gene associated with the autosomal dominant inherited congenital cataract from CCV. The mutation has the potential to destroy exon/intron splicing of both transcripts of RP1-140A9.1. Sanger and massive deep resequencing of the linkage region failed to identify alternative candidates, suggesting the mutation in RP1-140A9.1 is causative for the CCV phenotype.

    Functional analysis of structural variants in single cells using Strand-seq

    Somatic structural variants (SVs) are widespread in cancer, but their impact on disease evolution is understudied due to a lack of methods to directly characterize their functional consequences. We present a computational method, scNOVA, which uses Strand-seq to perform haplotype-aware integration of SV discovery and molecular phenotyping in single cells by using nucleosome occupancy to infer gene expression as a readout. Application to leukemias and cell lines identifies local effects of copy-balanced rearrangements on gene deregulation, and consequences of SVs on aberrant signaling pathways in subclones. We discovered distinct SV subclones with dysregulated Wnt signaling in a chronic lymphocytic leukemia patient. We further uncovered the consequences of subclonal chromothripsis in T cell acute lymphoblastic leukemia, which revealed c-Myb activation and enrichment of a primitive cell state, and informed successful targeting of the subclone in cell culture using a Notch inhibitor. By directly linking SVs to their functional effects, scNOVA enables systematic single-cell multiomic studies of structural variation in heterogeneous cell populations.

    Divergent mutational processes distinguish hypoxic and normoxic tumours

    Many primary tumours have low levels of molecular oxygen (hypoxia), and hypoxic tumours respond poorly to therapy. Pan-cancer molecular hallmarks of tumour hypoxia remain poorly understood, with limited comprehension of its associations with specific mutational processes, non-coding driver genes and evolutionary features. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2658 cancers across 38 tumour types, we quantify hypoxia in 1188 tumours spanning 27 cancer types. Elevated hypoxia associates with increased mutational load across cancer types, irrespective of underlying mutational class. The proportion of mutations attributed to several mutational signatures of unknown aetiology directly associates with the level of hypoxia, suggesting underlying mutational processes for these signatures. At the gene level, driver mutations in TP53, MYC and PTEN are enriched in hypoxic tumours, and mutations in PTEN interact with hypoxia to direct tumour evolutionary trajectories. Overall, hypoxia plays a critical role in shaping the genomic and evolutionary landscapes of cancer

    Integrative pathway enrichment analysis of multivariate omics data

    Multi-omics datasets represent distinct aspects of the central dogma of molecular biology. Such high-dimensional molecular profiles pose challenges to data interpretation and hypothesis generation. ActivePathways is an integrative method that discovers significantly enriched pathways across multiple datasets using statistical data fusion, rationalizes contributing evidence and highlights associated genes. As part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2658 cancers across 38 tumor types, we integrated genes with coding and non-coding mutations and revealed frequently mutated pathways and additional cancer genes with infrequent mutations. We also analyzed prognostic molecular pathways by integrating genomic and transcriptomic features of 1780 breast cancers and highlighted associations with immune response and anti-apoptotic signaling. Integration of ChIP-seq and RNA-seq data for master regulators of the Hippo pathway across normal human tissues identified processes of tissue regeneration and stem cell regulation. ActivePathways is a versatile method that improves systems-level understanding of cellular organization in health and disease through integration of multiple molecular datasets and pathway annotations
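
The abstract does not say which fusion procedure is used. As a generic illustration of combining per-dataset evidence for one gene, the sketch below applies Fisher's combined probability test, which assumes the input p-values are independent. The function name is hypothetical and this is not presented as the ActivePathways implementation.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p) follows a chi-square distribution
    with 2k degrees of freedom under the null. For even degrees of
    freedom the survival function has the closed form
    exp(-X/2) * sum_{i<k} (X/2)^i / i!, so no external library is needed.
    """
    if not pvalues or any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must be non-empty and lie in (0, 1]")
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # equals X / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Two hypothetical p-values for one gene from two datasets.
combined = fisher_combined_pvalue([0.05, 0.05])
```

When the datasets are correlated, Fisher's method is anti-conservative, and a correction that accounts for the dependence between p-values would be the safer choice.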